PoSE context length ext #1567
Conversation
I'm pretty sure your PR doesn't quite work for chunks > 2.
src/axolotl/utils/trainer.py (outdated)

```python
                i for i, token_id in enumerate(input_ids) if token_id in split_on_token_ids
            ]
        else:
            split_indices = [sample_len // chunks]
```
I don't think this is going to work for any n_chunks > 2, right?
ah, you're right
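For reference, the bug is that `[sample_len // chunks]` only produces the first boundary, so samples are never split more than once. A sketch of a fallback that yields evenly spaced boundaries for any chunk count (`get_split_indices` is a hypothetical helper name, not the PR's final code):

```python
def get_split_indices(sample_len: int, chunks: int) -> list[int]:
    # One boundary between each pair of adjacent chunks: chunks - 1 indices.
    # For chunks == 2 this reduces to [sample_len // 2], matching the
    # original single-split behavior; for chunks > 2 it adds the
    # boundaries the original expression was missing.
    return [sample_len * i // chunks for i in range(1, chunks)]
```

Splitting at these indices gives `chunks` pieces whose lengths differ by at most one token.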
* PoSE wip
* fixes for PoSE splitting
* set PoSE context len so we can pick that up separately from the usable training context len
* support min sample len and define num chunks
* fix chunk splitting
* support for curriculum/ordered learning with PoSE
* fix sequence len sort
* add curriculum_sampling to pydantic
PoSE paper: https://huggingface.co/papers/2309.10400
Model: https://huggingface.co/winglian/Llama-3-8b-64k-PoSE
YAML: https://huggingface.co/winglian/Llama-3-8b-64k-PoSE/blob/main/axolotl/pose.yaml
Adds the PoSE technique for extending context length without needing long-context training data.
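The core idea of PoSE, as described in the paper above, is to train on short sequences while exposing the model to position ids from the full target context window: each sample is split into chunks, and a random skip is added to the position ids of later chunks. A minimal sketch of that position-id manipulation (hypothetical helper, not axolotl's actual implementation):

```python
import random


def pose_position_ids(sample_len: int, context_len: int, chunks: int = 2) -> list[int]:
    # Chunk boundaries over the (short) real sample.
    bounds = [sample_len * i // chunks for i in range(chunks + 1)]
    # Random skip offsets, sorted so position ids stay strictly increasing
    # across chunk boundaries; the largest possible position is
    # sample_len - 1 + (context_len - sample_len) = context_len - 1.
    max_skip = context_len - sample_len
    skips = sorted(random.randint(0, max_skip) for _ in range(chunks))
    position_ids = []
    for c in range(chunks):
        position_ids.extend(p + skips[c] for p in range(bounds[c], bounds[c + 1]))
    return position_ids
```

The model thus sees relative positions spanning up to `context_len` while only ever attending over `sample_len` real tokens, which is what lets PoSE extend the usable context window without long-context data.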